11/08/2018

Goals for this week

  • Experimental Design
  • Power analyses
  • Multi-factor ANOVA
  • Nested ANOVA
  • Factorial ANOVA
  • Analysis of CoVariance (ANCOVA)

Multifactor ANOVA

Multifactor ANOVA

  • Nested ANOVA or nested design
    • factors might be hierarchical - in other words nested - within one another
    • The sources of variance are therefore hierarchical too
  • The factorial ANOVA design is the most common experimental design used to investigate more than one treatment variable
    • In a factorial design every combination of treatments from two (or more) treatment variables is investigated.
    • The main purpose of a factorial design is to evaluate possible interactions between variables.
    • An interaction between two explanatory variables means that the effect of one variable on the response depends on the state of a second variable.

Multifactor ANOVA

Key difference between nested and factorial designs

  • Nested designs are hierarchical
    • often contain sub-replicates that are random, uncontrolled, nuisance effects
    • but the nested factors can be of interest too
  • Factorial designs are
    • all pairwise combinations,
    • and often involve all combinations of factor levels
    • when each factor is fixed interactions can be assessed
  • Completely nested designs therefore have no interaction terms, whereas factorial designs do
  • Mixed models can have a combination of fixed and random factors that are more complicated

Nested ANOVA

Nested ANOVA

Walking stick example

  • Example 1: Study of “repeatability” (simple nested design)
  • The walking stick, Timema cristinae, is a wingless herbivorous insect on plants in chaparral habitats of California.
  • Nosil and Crespi (2006) measured individuals using digital photographs.
  • To evaluate measurement repeatability they took two separate photographs of each specimen.
  • After measuring traits on one set of photographs, they repeated the measurements on the second set.

Nested ANOVA

Walking stick example

Each pair of dots represents the two measurements

Nested ANOVA

Walking stick example

Nested ANOVA

ANOVA Table of Results

Nesting Logic

Nesting equations

Nesting hypothesis tests

Nesting MS calculations

Nested ANOVA table of results

R INTERLUDE

Nested ANOVA

R INTERLUDE

Nested ANOVA

andrew_data <- read.table('andrew.tsv', header=T, sep=‘\t')
head(andrew_data)
  • There are four variables: ‘TREAT’, ‘PATCH’, ‘QUAD’ and ‘ALGAE’
  • The main effect factor is TREAT
  • Make a simplified factor called TREAT2, in which 0% and 33% are a level called “low” and 66% and 100% are “high”
andrew_data$TREAT2 <- factor(c(rep(“low”,40),rep(“high”,40))
  • The nested factor is PATCH - also need to turn this into a factor
andrew_data$PATCH <- factor(andrew_data$PATCH)

R INTERLUDE

Nested ANOVA

  • In this case, our response variable is ALGAE
  • Look at the distribution of ALGAE for the two levels of TREAT2 using boxplots based on the patch means, which are the replicates in this case.
andrew.agg <- with(andrew_data, aggregate(data.frame(ALGAE), 
                  by = list(TREAT2=TREAT2, PATCH=PATCH), mean)

library(nlme)
andrew.agg <- gsummary(andrew_data, groups=andrew_data$PATCH)

boxplot(ALGAE ~ TREAT2, andrew.agg)
  • Evaluate assumptions based on the boxplots
  • Is the design balanced (equal numbers of sub-replicates per PATCH)?

R INTERLUDE

Nested ANOVA

  • Run the nested ANOVA:
nested.aov <- aov(ALGAE ~ TREAT2 + Error(PATCH), data=andrew_data)
summary(nested.aov)
  • Do we detect an effect of TREAT2 (high vs low sea urchin density)?
  • Estimate variance components to assess relative contributions of the random factors
library(nlme)
VarCorr(lme(ALGAE ~ 1, random = ~1 | TREAT2/PATCH, andrew_data))
  • Calculate the % of variation due to between-treatment differences vs. due to among patches within treatment differences.
  • See pg. 302 in Logan if you need help.
  • What do these variance component estimates tell us???

Factorial Designs

Multifactor ANOVA

  • For example, Relyae (2003) looked at how a moderate dose (1.6mg/L) of a commonly used pesticide, carbaryl (Sevin), affected bullfrog tadpole survival.
  • In particular, the experiment asked how the effect of carbaryl depended on whether a native predator, the red-spotted newt, was also present.
  • The newt was caged and could cause no direct harm, but it emitted visual and chemical cues to other tadpoles
  • The experiment was carried out in 10-L tubs (experimental units), each containing 10 tadpoles.
  • The four combinations of pesticide treatment (carbaryl vs. water only) and predator treatment (present or absent) were randomly assigned to tubs.
  • The results showed that survival was high except when pesticide was applied together with the predator.
  • Thus, the two treatments, predation and pesticide, seem to have interacted.

Multifactor ANOVA

Two Factor Factorial Designs

Three Factor Factorial Designs

Factorial Designs

Number of Replicates

Model 1 factorial ANOVA

both main effects fixed

Model 2 factorial ANOVA

both main effects fixed

Model 2 factorial ANOVA

both main effects random

The mean squares for a factorial model

The F-ratios for a factorial model

Interpretation

significant main and interaction effects

Interaction plots

R INTERLUDE

2-by-2 fixed effect factorial ANOVA

rnadata <- read.table('RNAseq.tsv', header=T, sep='')
head(rnadata)
  • continuous response variable and two main effect categorical variables
gene <- rnadata$Gene80
microbiota <- rnadata$Microbiota
genotype <- rnadata$Genotype
boxplot(gene ~ microbiota)
boxplot(gene ~ genotype)
boxplot(gene ~ microbiota*genotype)

Fit the factorial linear model

two different ways to do the same thing

rna_aov <- aov(gene ~ microbiota + genotype + microbiota:genotype)
rna_aov <- aov(gene ~ microbiota*genotype)
  • Examine the fitted model diagnostics and the ANOVA results table
plot(rna_aov)
summary(rna_aov)
anova(rna_aov)
  • What are the general results of our hypothesis tests?
  • If there is an interaction, can we understand it by looking at the boxplots?

R INTERLUDE

2-by-3 fixed effect factorial ANOVA

  • Try the following code to produce an interaction plot for the response variable cell count.
  • In this case there are 2 genotypes and 3 treatment levels.
  • Download the IntPlot_data file and IntPlot_Example.R
  • Go through the R script, get a feel for what it’s doing, and try to produce and interpret the interaction plot.

Means tests for multifactorial ANOVAs

Means tests

factor level combinations in multi-factor ANOVA

  • The F-ratio test for a single-factor ANOVA tests for any difference among groups.
  • If we want to understand specific differences, we need further “contrasts”.
  • Unplanned comparisons (post hoc)
  • Planned comparisons (a priori)
  • Now we need to make ‘pseudo-factors’ that combine our levels of interest

Planned (a priori) contrasts

R INTERLUDE

2x2 Fixed-Effects Factorial ANOVA contrasts & interaction

continuous response and two main effect variables

rnadata <- read.table('RNAseq.tsv', header=T, sep='')
gene <- rnadata$Gene80
microbiota <- rnadata$Microbiota
genotype <- rnadata$Genotype

make new “pseudo factor,” combining genotype and microbiota

gxm <- interaction(genotype,microbiota)
levels(gxm)
boxplot(gene ~ gxm)

specify the following 2 contrasts

contrasts(gxm) <- cbind(c(2, -1, 0, -1), c(-1, -1, 3, -1))

R INTERLUDE

2x2 Fixed-Effects Factorial ANOVA contrasts & interaction

Fit the factorial linear model

rna_aov <- aov(gene ~ gxm)

Examine the ANOVA table, using supplied contrasts. Figure out the appropriate titles to give them.

summary(rna_aov, split = list(gxm = list('xxx'=1,'xxx'=2)))

What does the contrast summary tell you about the nature of the interaction?

Mixed effect models with unequal sample sizes

Attributes of mixed effects models

  • Linear models that include both fixed and random effects.
  • The model is split into fixed and random parts:
    • Fixed effects influence mean of the response variable Y.
    • Random effects influence the variance of Y.
  • There is a different error variance for each level of grouping.
  • Estimation and testing is based on restricted maximum likelihood, which can handle unequal sample size.
  • P-values for fixed effects are conservative when design unbalanced.
  • Implemented in the nlme & lme4 packages in R.

Assumptions of mixed-effects models

  • Variation within groups follows a normal distribution with equal variance among groups.
  • Groups are randomly sampled from “population” of groups.
  • Group means follow a normal distribution.
  • Measurements within groups are independent.

Hypotheses for Model 3 ANOVA Factorial Design With Mixed Effects

General R syntax for two factor factorial designs

R INTERLUDE

Variance components with 2 random factors using LME4

rnadata <- read.table('RNAseq.tsv', header=T, sep='')
head(rnadata)

variables excluding first 5 and last 5 observations

gene <- rnadata$Gene80[6:75] 
microbiota <- rnadata$Microbiota[6:75]
genotype <- rnadata$Genotype[6:75]
boxplot(gene ~ microbiota)
boxplot(gene ~ genotype)
boxplot(gene ~ microbiota*genotype)

Estimate the variance components using Restricted Maximum Likelihood (REML)

library(lme4)
lmer(gene ~ 1 + (1 | microbiota) + (1 | genotype) + (1 | microbiota:genotype))

Based on the REML sd estimates, what are the relative contributions of the factors to total variance in gene expression?

Analysis of Covariance (ANCOVA)

Brain & body size

neaderthals as compared to humans

Brain & body size

neaderthals as compared to humans

Brain & body size

neaderthals as compared to humans

ANCOVA

  • Analysis of covariance - mixture of regression and ANOVA
  • Response is still a normally distributed continuous variable
  • One or more continuous predictor variables (covariates)
  • Sometimes the covariates are of biological interest
  • Most often we want to remove unexplained variance
  • In this way they are similar to a blocking variable in ANOVA
  • Operationally, ANCOVA is regular ANOVA in which the group and overall means are replaced by group and overall relationships

ANCOVA

Adjusting for the covariate

ANCOVA

Adjusting for the covariate

ANCOVA

Linear model with two covariates

ANCOVA

Factor and covariate hypothesis tests

ANCOVA

F ratio tests

ANCOVA

Assumptions

  • The residuals are normally distributed
  • The residuals show homoscedasticity of variance
  • The residuals are independent of one another
  • The relationship between the response variable and each covariate is linear
  • Homogeneity of slopes among the groups
  • Similar covariate ranges among the groups

ANCOVA

Heterogeneous slopes

ANCOVA

Heterogeneous slopes

  • Problem - adjusting to a mean is difficult or impossible if the slopes are different
  • In essence, the samples for the groups come from two different populations
  • A test for homogeneity of slopes can be performed
  • The assumption is tested by looking for a significant interaction term between the categorical response variables and the covariate(s)

ANCOVA

Non-overlapping range of the covariate

R INTERLUDE

ANCOVA

  • Impacts of sexual activity on male fruitfly longevity
  • Data from Partridge and Faraquhar (1981)
  • Longevity of male measured in response to access to
    • no females
    • one virgin
    • eight virgins
    • one mated
    • eight mated
  • The male fruit flies also varied in size
  • The males were assigned randomly to each of the treatment levels, and then measured thorax length as a covariate

R INTERLUDE

ANCOVA

longevity_data <- read.table('longevity.csv', header=T, sep=',')
head(longevity_data)

Variables

long <- longevity_data$LONGEVITY
treat <- longevity_data$TREATMENT
thorax <- longevity_data$THORAX
  • check to see if the covariate should be included
boxplot(long ~ treat)
plot(long ~ thorax)

R INTERLUDE

ANCOVA

  • assess assumptions of normality and homogeneity of variance
plot(aov(long ~ thorax + treat ), which = 1)
  • †ry it again with a transformed response variable
plot(aov(log10(long) ~ thorax + treat ), which = 1)
  • visually assess linearity, homogenetiy of slopes and covariate range equality
library(lattice)
print(xyplot(log10(long) ~ thorax | treat, type = c("r", "p")))

R INTERLUDE

ANCOVA

  • formally test homogenetiy of slopes by testing the interaction term
anova(aov(log10(long) ~ thorax*treat))
  • formally test covariate range disparity by modeling the effect of the treatments on the covariate
anova(aov(thorax ~ treat))
  • FINALLY, set up contrasts, fit the additive model and visualize the results (pg. 459 and 460 of your Logan book)
  • Summarize the trends in a nice plot (pg. 461 of your Logan book)